Pattern Based Comprehensive Urdu Stemmer and Short Text Classification
Authors
Abstract
Similar Resources
Rule Based Urdu Stemmer
This paper presents a rule-based Urdu stemmer. In this technique, rules are applied to remove suffixes and prefixes from inflected words. Urdu is widely spoken all over the world, but little work has been done on Urdu stemming. A stemmer helps find the root of an inflected word. Various inflected word endings, such as وں (vao+noon-gunna), ے (badi-ye), یاں (choti-ye+alif+noon-gunna) ...
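The affix-stripping idea described above can be illustrated with a minimal Python sketch. It is not the paper's rule set: only the three suffixes named in the abstract are used, while the prefix بے ("without"), the minimum stem length of two characters, and the function name `stem` are assumptions added for illustration. Characters are written as Unicode escapes so that visually similar Arabic-script code points cannot be confused.

```python
# Minimal rule-based affix stripper (illustrative sketch, not the paper's full rule set).
# Unicode escapes avoid confusing look-alike Arabic-script code points.

# Suffixes named in the abstract.
SUFFIXES = [
    "\u06cc\u0627\u06ba",  # yaan: choti-ye + alif + noon-gunna
    "\u0648\u06ba",        # vao + noon-gunna
    "\u06d2",              # badi-ye
]

# Hypothetical prefix list; "be-" ("without") is an assumed example, not from the source.
PREFIXES = [
    "\u0628\u06d2",        # be + badi-ye
]

MIN_STEM_LEN = 2  # assumed guard so short words are never reduced to (almost) nothing


def stem(word: str) -> str:
    """Strip at most one prefix and one suffix, trying longer affixes first."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix):
            if len(word) - len(prefix) >= MIN_STEM_LEN:
                word = word[len(prefix):]
            break  # only the longest matching prefix is considered
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix):
            if len(word) - len(suffix) >= MIN_STEM_LEN:
                word = word[: -len(suffix)]
            break  # only the longest matching suffix is considered
    return word
```

With these rules, `stem("کتابوں")` ("books") returns `"کتاب"`, while the two-letter postposition `"کے"` is left unchanged by the length guard. Trying longer affixes first is a common safeguard once a rule list grows to include overlapping endings.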
Assas-band, an Affix-Exception-List Based Urdu Stemmer
Both inflectional and derivational morphology lead to multiple surface forms of a word. Stemming reduces these forms back to their stem or root and is a very useful tool for many applications. There has not been any work reported on Urdu stemming. The current work develops an Urdu stemmer, Assas-Band, and improves its performance using more precise affix-based exception lists, instead of the co...
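An affix-specific exception list can be attached to a suffix stripper so that words which merely look inflected are left alone. The following Python sketch illustrates that general idea under stated assumptions and is not Assas-Band itself: the dictionary name `SUFFIX_EXCEPTIONS`, the function `strip_suffix`, the minimum stem length, and the single exception entry کیوں ("why"), which ends in وں without being a plural, are hypothetical choices made for this example.

```python
# Sketch of affix-specific exception lists (in the spirit of Assas-Band; entries are hypothetical).
from typing import Dict, Set

# Each suffix maps to the set of words that end with it but must NOT be stripped.
SUFFIX_EXCEPTIONS: Dict[str, Set[str]] = {
    "\u06cc\u0627\u06ba": set(),                   # choti-ye + alif + noon-gunna
    "\u0648\u06ba": {"\u06a9\u06cc\u0648\u06ba"},  # vao + noon-gunna; "kyun" ("why") is not a plural
    "\u06d2": set(),                               # badi-ye
}


def strip_suffix(word: str, min_stem: int = 2) -> str:
    """Remove the longest matching suffix unless the word is listed as an exception for it."""
    for suffix in sorted(SUFFIX_EXCEPTIONS, key=len, reverse=True):
        if word.endswith(suffix):
            if word in SUFFIX_EXCEPTIONS[suffix] or len(word) - len(suffix) < min_stem:
                return word  # exception or too short: leave the word unchanged
            return word[: -len(suffix)]
    return word
```

Here `strip_suffix` maps کتابوں to کتاب but returns کیوں unchanged. Keeping one exception set per affix means an entry only blocks the rule it was written for, rather than exempting the word from every rule.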
Challenges in Developing a Rule based Urdu Stemmer
The Urdu language raises several challenges for Natural Language Processing (NLP), largely due to its rich morphology. In this language, morphological processing becomes particularly important for Information Retrieval (IR). The core tool of IR is a stemmer, which reduces a word to its stem form. Due to the diverse nature of Urdu, developing a stemmer is a challenging task. In Urdu, there are large numb...
Short Text Classification Based on Improved ITC
Long text classification has achieved great success, but short text classification still needs improvement. In this paper, we first describe why we select the ITC feature selection algorithm rather than the conventional TFIDF, and the superiority of ITC over TFIDF; we then summarize the flaws of the conventional ITC algorithm, and then present an improved ITC feature selec...
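For reference, the conventional TF-IDF weighting that the abstract contrasts with ITC can be sketched in Python as below, using one common variant: tf(t, d) = count of t in d divided by the length of d, and idf(t) = log(N / df(t)). The improved ITC formula is not given in the snippet and is therefore not reproduced; the function name `tfidf` and the choice of variant are assumptions.

```python
# Conventional TF-IDF baseline (one common variant; not the paper's improved ITC).
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights
```

A term that occurs in every document, such as `"a"` in `[["a", "b"], ["a", "c"]]`, receives weight 0, while `"b"` receives 0.5 · ln 2 ≈ 0.347 in the first document.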
An Efficient Method for Urdu Language Text Search in Image Based Urdu Text
This paper describes an efficient method for Urdu text search in computer-generated and handwritten scanned images. An efficient text search technology is necessary because the number of documents handled grows every day. This method is unique and simple in the sense that no features are extracted. The proposed method is script-independent. The input image is directly matched with a set of prototype c...
Journal
Journal title: IEEE Access
Year: 2018
ISSN: 2169-3536
DOI: 10.1109/access.2017.2787798